Labeling audio-visual speech corpora and training an ANN/HMM audio-visual speech recognition system
نویسندگان
چکیده
We present a method to label an audio-visual database and to setup a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. The multi-stage labeling process is presented on a new audiovisual database recorded at the Institute de la Communication Parlée (ICP). The database was generated via transposition of the audio database NUMBERS95. For the labeling first a large subset of NUMBERS95 is used to achieve a bootstrap training of an ANN, which can then be employed to label the audio part of the audio-visual database. This initial labeling is further improved via readapting the ANN to the new database and reperforming the labeling. From the audio labeling then the video labeling is derived. Tests at different Signal to Noise Ratios (SNR) are performed to demonstrate the efficiency of the labeling process. Furthermore ways to incorporate information from a large audio database into the final audio-visual recognition system were investigated.
منابع مشابه
A hybrid ANN/HMM audio-visual speech recognition system
In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To setup the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...
متن کاملA Hybrid Ann/hmm Audio-visual Spee System
In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To setup the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...
متن کاملCzech audio-visual speech corpus of a car driver for in-vehicle audio-visual speech recognition
This paper presents the design of an audio-visual speech corpus for in-vehicle audio-visual speech recognition. Throughout the world, there exist several audio-visual speech corpora. There are also several (audio-only) speech corpora for in-vehicle recognition. So far, we have not found an audiovisual speech corpus for in-vehicle speech recognition. And, we have not found any audio-visual speec...
متن کاملAsynchrony modeling for audio-visual speech recognition
We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...
متن کاملDiscriminative training of HMM stream exponents for audio-visual speech recognition
We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to respectively train audio-only and visual-only single-stream HMMs of identical topology by maximum likelihood. A two-stream HMM is then obtaine...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000